Running ~5k Haliotis discus hannai ESTs through CD-HIT-EST
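
A minimal sketch of how a CD-HIT-EST run like this could be launched from Python. The file names and the 0.95 identity threshold are assumptions for illustration, not the settings used for this run. `-i`, `-o`, `-c` and `-n` are standard CD-HIT-EST options.

```python
import subprocess


def cd_hit_est_command(in_fasta, out_fasta, identity=0.95, word_size=10):
    """Build a cd-hit-est argument list (all values here are illustrative)."""
    return [
        "cd-hit-est",
        "-i", in_fasta,        # input FASTA of ESTs
        "-o", out_fasta,       # representative sequences; a .clstr file is written alongside
        "-c", str(identity),   # sequence identity threshold (assumed value)
        "-n", str(word_size),  # word length; 10 suits thresholds of 0.95 and above
    ]


if __name__ == "__main__":
    # Hypothetical file names, not the ones from this entry.
    cmd = cd_hit_est_command("ests.fasta", "ests_cdhit.fasta")
    print(" ".join(cmd))
    # Uncomment to actually run (requires cd-hit-est on PATH):
    # subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to record the exact command in the notebook before launching it.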



Sequence type     DNA
No. sequences   2809
Longest sequence        1066
Shortest sequence       51
Average length  421
Total letters   1184035
Total N letters 2423
Total non N     1181612
Sequences with N        918
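
The summary fields above can be recomputed from a FASTA file with a short script. This is a sketch, not the tool that produced the numbers. The function names are made up for illustration, and the average uses integer division because the reported 421 equals 1184035 / 2809 rounded down.

```python
def read_fasta(text):
    """Return a list of sequences (uppercase strings) from FASTA-formatted text."""
    seqs, chunks = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if chunks is not None:
                seqs.append("".join(chunks).upper())
            chunks = []
        elif line and chunks is not None:
            chunks.append(line)
    if chunks is not None:
        seqs.append("".join(chunks).upper())
    return seqs


def summarize(seqs):
    """Compute the summary fields for a non-empty list of sequences."""
    lengths = [len(s) for s in seqs]
    total = sum(lengths)
    total_n = sum(s.count("N") for s in seqs)
    return {
        "No. sequences": len(seqs),
        "Longest sequence": max(lengths),
        "Shortest sequence": min(lengths),
        "Average length": total // len(seqs),  # floored, as in the reported figure
        "Total letters": total,
        "Total N letters": total_n,
        "Total non N": total - total_n,
        "Sequences with N": sum(1 for s in seqs if "N" in s),
    }


if __name__ == "__main__":
    demo = ">a\nACGTN\nAC\n>b\nGGCC\n"  # toy input, not the abalone data
    for key, value in summarize(read_fasta(demo)).items():
        print(f"{key}\t{value}")
```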


GC content distribution
GC content (%)  No. sequences   % sequences
0%-5%   0       0%
5%-10%  0       0%
10%-15% 0       0%
15%-20% 4       0.14%
20%-25% 9       0.32%
25%-30% 39      1.38%
30%-35% 96      3.41%
35%-40% 220     7.83%
40%-45% 503     17.9%
45%-50% 716     25.48%
50%-55% 828     29.47%
55%-60% 339     12.06%
60%-65% 53      1.88%
65%-70% 2       0.07%
70%-75% 0       0%
75%-80% 0       0%
80%-85% 0       0%
85%-90% 0       0%
90%-95% 0       0%
95%-100%        0       0%
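
A sketch of how a GC table like the one above could be built. Two things are assumptions: that N bases count toward sequence length, and that a value sitting exactly on a bin edge goes into the upper bin (with 100% placed in the last bin). The tool that produced the table may handle both differently.

```python
from collections import Counter


def gc_percent(seq):
    """GC percentage of one sequence; every letter, including N, counts toward length."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_histogram(seqs, width=5):
    """Return (label, count, percent of sequences) rows for fixed-width GC bins."""
    n_bins = 100 // width
    # min(...) keeps a sequence with exactly 100% GC inside the last bin.
    counts = Counter(min(int(gc_percent(s) // width), n_bins - 1) for s in seqs)
    rows = []
    for i in range(n_bins):
        label = f"{i * width}%-{(i + 1) * width}%"
        share = 100.0 * counts[i] / len(seqs) if seqs else 0.0
        rows.append((label, counts[i], round(share, 2)))
    return rows


if __name__ == "__main__":
    for label, count, share in gc_histogram(["GGCC", "ATAT", "ACGT"]):  # toy sequences
        print(f"{label}\t{count}\t{share}%")
```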


Length distribution (50 bp bins)
Length (bp)  No. sequences   % sequences
50-99   11      0.39%
100-149 147     5.23%
150-199 178     6.33%
200-249 208     7.4%
250-299 263     9.36%
300-349 267     9.5%
350-399 288     10.25%
400-449 240     8.54%
450-499 253     9%
500-549 208     7.4%
550-599 195     6.94%
600-649 193     6.87%
650-699 177     6.3%
700-749 96      3.41%
750-799 42      1.49%
800-849 38      1.35%
850-899 3       0.1%
900-949 0       0%
950-999 0       0%
1000-1049       1       0.03%
1050-1099       1       0.03%
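
The fixed 50 bp binning above can be sketched as follows. The function name is hypothetical, and the input is a plain list of sequence lengths.

```python
from collections import Counter


def length_histogram(lengths, width=50):
    """Bin a non-empty list of sequence lengths into fixed-width bins labelled like '50-99'."""
    counts = Counter((n // width) * width for n in lengths)
    rows = []
    # Walk every bin from the lowest to the highest occupied one, including empty bins.
    for lo in range(min(counts), max(counts) + width, width):
        share = round(100.0 * counts[lo] / len(lengths), 2)
        rows.append((f"{lo}-{lo + width - 1}", counts[lo], share))
    return rows


if __name__ == "__main__":
    for label, count, share in length_histogram([51, 99, 100, 1066]):  # toy lengths
        print(f"{label}\t{count}\t{share}%")
```

With a shortest length of 51 and a longest of 1066 this yields 21 rows, from 50-99 to 1050-1099, the same range of bins as the table.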


Length distribution (20 equal-width bins)
Length (bp)  No. sequences   % sequences
51-101  17      0.6%
101-151 153     5.44%
152-202 180     6.4%
203-253 216     7.68%
254-304 265     9.43%
305-355 280     9.96%
355-405 296     10.53%
406-456 246     8.75%
457-507 240     8.54%
508-558 196     6.97%
559-609 223     7.93%
609-659 180     6.4%
660-710 161     5.73%
711-761 84      2.99%
762-812 38      1.35%
813-863 30      1.06%
863-913 2       0.07%
914-964 0       0%
965-1015        0       0% 
1016-1066       2       0.07%
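
The bin labels in this second table are consistent with splitting the range from shortest (51) to longest (1066) into 20 equal parts and rounding each lower edge down. This rule is inferred from the labels alone, not from the tool's documentation, and it does not say how a sequence sitting on a shared edge (for example 101) is assigned. A sketch that reproduces the labels:

```python
def equal_width_labels(shortest, longest, n_bins=20):
    """Return 'lo-hi' labels for n_bins bins spanning shortest..longest inclusive."""
    span = longest - shortest + 1           # 1016 for 51..1066
    step = span // n_bins                   # 50; each label covers lo..lo+step
    labels = []
    for i in range(n_bins):
        lo = shortest + (i * span) // n_bins  # floored lower edge, integer arithmetic
        labels.append(f"{lo}-{lo + step}")
    return labels


if __name__ == "__main__":
    print("\n".join(equal_width_labels(51, 1066)))
```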

1327953746.fasta








Sequence type     DNA
No. sequences   3275
Longest sequence        1066
Shortest sequence       51
Average length  413
Total letters   1355756
Total N letters 2978
Total non N     1352778
Sequences with N        1144
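
A quick sanity check on the two summary blocks in this entry, using only the figures reported in them: non-N letters should equal total letters minus N letters, and the average length should equal total letters divided by the sequence count, rounded down.

```python
# Figures copied from the two summary blocks in this entry.
summaries = {
    "first block": {
        "seqs": 2809, "avg": 421, "letters": 1184035, "n": 2423, "non_n": 1181612,
    },
    "second block": {
        "seqs": 3275, "avg": 413, "letters": 1355756, "n": 2978, "non_n": 1352778,
    },
}


def is_consistent(s):
    """True if non-N = letters - N and the average is the floored mean length."""
    return (
        s["letters"] - s["n"] == s["non_n"]
        and s["letters"] // s["seqs"] == s["avg"]
    )


if __name__ == "__main__":
    for name, s in summaries.items():
        print(name, "consistent" if is_consistent(s) else "INCONSISTENT")
```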


GC content distribution
GC content (%)  No. sequences   % sequences
0%-5%   0       0%
5%-10%  0       0%
10%-15% 0       0%
15%-20% 4       0.12%
20%-25% 8       0.24%
25%-30% 41      1.25%
30%-35% 105     3.2%
35%-40% 246     7.51%
40%-45% 571     17.43%
45%-50% 866     26.44%
50%-55% 975     29.77%
55%-60% 394     12.03%
60%-65% 62      1.89%
65%-70% 3       0.09%
70%-75% 0       0%
75%-80% 0       0%
80%-85% 0       0%
85%-90% 0       0%
90%-95% 0       0%
95%-100%        0       0%


Length distribution (50 bp bins)
Length (bp)  No. sequences   % sequences
50-99   12      0.36%
100-149 186     5.67%
150-199 214     6.53%
200-249 247     7.54%
250-299 307     9.37%
300-349 323     9.86%
350-399 361     11.02%
400-449 281     8.58%
450-499 284     8.67%
500-549 247     7.54%
550-599 216     6.59%
600-649 218     6.65%
650-699 194     5.92%
700-749 100     3.05%
750-799 42      1.28%
800-849 38      1.16%
850-899 3       0.09%
900-949 0       0%
950-999 0       0%
1000-1049       1       0.03%
1050-1099       1       0.03%


Length distribution (20 equal-width bins)
Length (bp)  No. sequences   % sequences
51-101  18      0.54%
101-151 193     5.89%
152-202 217     6.62%
203-253 253     7.72%
254-304 313     9.55%
305-355 341     10.41%
355-405 370     11.29%
406-456 283     8.64%
457-507 271     8.27%
508-558 235     7.17%
559-609 241     7.35%
609-659 209     6.38%
660-710 173     5.28%
711-761 86      2.62%
762-812 38      1.16%
813-863 30      0.91%
863-913 2       0.06%
914-964 0       0%
965-1015        0       0% 
1016-1066       2       0.06%






1327954227.fasta

Blasting on Inquiry, job 690
1327954227_blastall_inquiry.txt



Running de novo with CLC 4.0





Assembly with beta ..






Using Server to map reads back to the assembly to get an idea of % mapped.